Transcriptional and functional motifs defining renal function revealed by single-nucleus RNA sequencing

Significance We performed a single-nucleus RNA sequencing study of the adult fly kidney, identifying 11 distinct clusters that we validated by gene markers. We characterized the roles of transcription factors involved in stem cell regeneration and metabolism, as well as genes that regulate the unusual cell shape of stellate cells. The dataset also provides a systems-level view of the organization and physiological roles of the tubules. Finally, we performed a cross-species analysis that allowed a comparison of the fly kidney cell types with mouse kidney cell types, as well as planarian protonephridia. This study provides a comprehensive resource for studying the insect kidney.

Dataset processing. The quality of the raw sequencing data was checked by FastQC software. The raw sequencing data were processed by cellranger count pipeline to generate the single cell matrix for each sample. The single cell matrix was analyzed by the Seurat package and Harmony was used for batch correction. The Malpighian tubules and nephrocytes samples were processed and clustered separately, and then were merged by filtering out unrelated cell clusters. The processed result of the gene expression matrix was used for the downstream data analysis.
To facilitate mining of the datasets, we developed a visualization web portal (https://www.flyrnai.org/scRNA/kidney/) that allows users to query the expression of any genes of interest in different cell types and to compare the expression of any 2 genes in individual cells. This Malpighian tubule dataset can also be mined at Fly Cell Atlas (https://flycellatlas.org/) along with other datasets generated by the FCA consortium (1).
Gene ontology (GO) analysis. Gene ontology (GO) analysis was performed by clusterProfiler. The marker genes identified using Seurat were used for GO analysis. The strength of enrichment was calculated as negative of log10(p-value), which was used to plot the barplot.
Cross-tissue analysis. Cross-tissue analysis data are from the Malpighian tubules processed dataset and midgut dataset (accession code: GSE120537). Before merging the two datasets, the number of Malpighian tubules cells was downsampled to the same size as the midgut dataset. The top markers of renal stem cells were calculated by comparing the renal stem cells with the rest of the merged dataset without intestinal stem cells. The top markers of intestinal stem cells were calculated by comparing the intestinal stem cells with the rest of the merged dataset without renal stem cells. Results of the comparison are visualized on a Venn diagram.
Pseudotemporal ordering of cells using Monocle3. Processed Malpighian tubules dataset was analyzed using Monocle3 for pseudotemporal ordering. The state representing lower ureter principal cells was chosen as the starting time point. The Ridge plot was generated by extracting the cell clustering and pseudotime information and then visualized by Seurat RidgePlot function. The gene expression heatmap was generated by merging the cells into bins by the order of pseudotime and visualized by the pheatmap R package. This analysis was done for both fly data and mouse data (GSE129798).
Transcription factors enrichment and SCENIC analysis. The top markers for each cluster were used as the candidates for TFs enrichment analysis. The markers were filtered by fold change > 3 and adjusted p-value < 0.05. TFs from the filtered markers were visualized by the Seurat DoHeatmap. The analysis of regulon activity was conducted using the SCENIC pipeline (2). Cells from the previously processed dataset were selected as the input cells. The TF and co-expressed genes were constructed by Cross-species analysis. Datasets included in the cross-species analysis were the processed dataset from this study for the fly, the mouse kidney dataset GSE129798 and the planarian dataset GSE111764. The analysis was conducted using the SAMap software (7). The input file for SAMap was processed using the Self-Assembling-Manifold (SAM) algorithm. Alignments for each cell type in fly and mouse were calculated by the get_mapping_scores function. Enriched gene pairs from the aligned cell types were retrieved by find_all function with a default alignment score threshold of 0.1. The SAMap results were visualized by the sankey_plot function. To assess the specificity of cluster mapping between fly Malpighian tubules and mouse kidney, the top markers of each cell type from mouse kidney data were also compared to the top markers of the FCA 252 cell clusters representing all fly tissues (1) using DRscDB (8).
Fly genetics. Fly husbandry and crosses were performed under a 12:12 hour light:dark photoperiod at 25°C. Hand-GFP; 4xHand-Gal4/CyO and Dot-Gal4 stocks are gifts from Dr. Han Zhe. esg-Gal4 and Pros-Gal4 are from the Perrimon lab stock collection.
All images presented in this study are confocal images captured with a Nikon Ti2 Spinning Disk confocal microscope. Z-stacks of 15-20 images covering one layer of the epithelium from the apical to the basal side were obtained, adjusted, and assembled using NIH Fiji (ImageJ), and shown as a maximum projection. Details of the imaging method are as follows: Samples were imaged with a Yokogawa CSU-W1 single disk (50 µm pinhole size) spinning disk confocal unit attached to a fully motorized Nikon Ti2 inverted microscope equipped with a Nikon linear-encoded motorized stage with a Mad City Labs 500 µm range Nano-Drive Z piezo insert, an Andor Zyla 4.2 plus (6.5 µm photodiode size) sCMOS camera using a Nikon Plan Apo 60x/1.4 NA DIC oil immersion objective lens with Cargille Type 37 immersion oil (cultured cells) or a Nikon Plan Apo 20x/0.75 DIC air objective lens (tissue samples). The final digital resolution of the image was 0.109 and 0.325 µm/pixel, respectively. Fluorescence from DAPI, Alexa Fluor (AF)-488, and AF555 was collected by illuminating the sample with directly modulated solid-state lasers 405 nm diode 100 mW (at the fiber tip) laser line, 488 nm diode 100 mW laser line, and 561 nm DPSS 100 mW laser line in a Toptica iChrome MLE laser combiner, respectively. A hard-coated Semrock Di01-T405/488/568/647 multi-bandpass dichroic mirror was used for all channels. Signal from each channel was acquired sequentially with hard-coated Chroma ET455/50, Chroma ET525/36 nm, and Chroma ET605/52 nm emission filters in a filter wheel placed within the scan unit, for blue, green, and red channels, respectively. Nikon Elements AR 5.02 acquisition software was used to acquire the data. 2 µm range Z-stacks, set by indicating the middle focal plane and a z-step interval of 50 µm, were acquired using piezo Z-device, with the shutter closed during axial movement. Images were acquired by collecting the entire Z-stack in each color or by acquiring each channel in each focal plane within the Z stack. Data were saved as ND2 files.
Lipid and Carbohydrate Measurements. We measured fly carbohydrates and triglycerides as described previously (9,10). To prepare fly lysates for metabolic assays, we homogenized 4 flies from each group with 300 μl PBS supplemented with 0.2% Triton X-100 and heated at 70°C for 10 min. The supernatant was collected after centrifugation at 3000g for 1 min at 4°C. 10 μl of supernatant was used for protein quantification using Bradford Reagent (Sigma, B6916-500ML). Whole-body trehalose levels were measured from 10 μl of supernatant treated with 0.2 μl trehalase (Megazyme; E-TREH) at 37°C for 30 min using glucose assay reagent (Megazyme; K-GLUC) following the manufacturer's protocol. Whole-body glycogen levels were determined from 10 μl of supernatant preincubated with 1 μl amyloglucosidase (Sigma-Aldrich; A7420) at 37°C for 30 min using glucose assay reagent (Megazyme; K-GLUC). We subtracted the amount of free glucose from the measurement and then normalized the subtracted values to protein levels in the supernatant. To measure whole body triglycerides, we processed 10 μl of supernatant using a Serum Triglyceride Determination kit (Sigma, TR0100). We subtracted the amount of free glycerol from the measurement and then normalized the subtracted values to protein levels. mRNA quantification. Total RNA was extracted from 25 Malpighian tubules using NucleoSpin RNA kit (Macherey-Nagel) and converted to cDNA using the iScript cDNA Synthesis kit (Bio-Rad). cDNAs were analyzed by quantitative PCR (qPCR) using the SYBR Green kit (Bio-Rad) and Bio-Rad CFX Manager software. rp49 were used as internal control. Each RT-qPCR was performed with three technical replicates and three biological replicates. qPCR primer pairs (forward & reverse) are shown below:

Supplementary text
Developmental trajectory analysis of principal cells. Principal cells (PCs) are mitochondria-rich and transport protons through an apical, plasma membrane vacuolar H + -ATPase (V-ATPase) (11). The main functions of principal cells are to set up a potassium gradient (12,13), which enters the cell basolaterally through the combined activity of Na + , K + -ATPase (14), inward rectifier potassium channels (15)(16)(17), and potassium cotransports (18)(19)(20). We identified six PC clusters from the scRNA-seq dataset and identified Gal4 lines that allowed us to precisely map their anatomical locations (Fig. 1A). To understand the functional differences of each PC cell cluster, we performed a GO analysis based on marker genes (Fig. S2A). Most of the GO terms refer to transport and responses to toxic substances, reflecting the main functions of the tubule. Interestingly, the top 10 terms in lower tubule PCs refer to transport, suggesting that the function of lower tubule principal cells is to transport substances between Malpighian tubules and the hemolymph (GO information is in Dataset S8).
scRNA-seq enables the exploration of the continuous differentiation trajectory of a developmental process. Thus, to analyze the developmental trajectory of principal cells, we conducted a pseudotime analysis by ordering cells along a reconstructed trajectory using Monocle3 (Fig. S2B and S2C). Consistent with the distribution distance on the UMAP, inferred trajectories demonstrated gradual transitions from cells in lower ureter principal cells, upper ureter principal cells, lower tubule principal cells, and lower segment principal cells to main segment principal cells, initial and transitional principal cellss (Fig. S2D). On the UMAP, lower segment principal cells are close to lower ureter principal cells, upper ureter principal cells, and lower tubule principal cells. The pseudotime analysis also showed that the state of lower segment principal cells is a co-mixture of lower ureter principal cells, upper ureter principal cells, and lower tubule principal cells (Fig. S2D). These results are consistent with our observation in vivo using Esyt2-Gal4 (Fig. 1C), which suggested that lower segment PCs represent a new cell cluster that is different from previously reported Alp4 expressed cells.
We chose Best2, bifid (bi), Sarcoplasmic calcium-binding protein 2 (Scp2), PDGFand VEGF-receptor related (Pvr), Uro, salty dog (salt), Alp4, SPR, and Transient receptor potential cation channel A1 (TrpA1) as representative genes for each cluster (Fig. S2E). A survey of our scRNA-seq dataset revealed that the expression of TrpA1 gradually decreased along the pseudotime, followed by increased transcription of Alp4 and SPR. The expression of Pvr, Uro and salt was elevated at the more geographical distant region of main segment principal cells, while the progressive increase of Best2, bi, and Scp2 expression was only observed in initial and transitional principal cells (Fig.  S2E). These results indicate that the patterns of expression of marker genes in each cluster are in concordance with the pseudotime analysis of the clusters.

Similarities between renal and intestinal stem cells.
Compared to a previous report that performed scRNAseq of the Malpighian tubules focusing on the stem cell zone (21), our study captured more cells and more cell types. Wang and Spradling focused on the response of renal stem cells to tissue injury. Renal stem cells were previously identified as a distinct population that expresses esg (22). Renal stem cells are located in the lower ureter, the upper ureter and lower segment of the Malpighian tubule with small nuclei (23). They respond to tissue injury by upregulating the JNK, EGFR/MAPK, Hippo/Yki and JAK/STAT pathways that promote renal stem cells daughter differentiation (21). Renal stem cells originate from the same pool of adult midgut progenitors that generate the posterior midgut intestinal stem cells (23,24). To examine how similar renal stem cells are to intestinal stem cells, we compared the snRNA-seq renal stem cells data with previously reported scRNAseq intestinal stem cells data (25). Consistent with their common origin (23), the two stem cell clusters have high similarity at the gene expression level compared to other cell clusters (Fig.  S11A).
The esg gene, a stem cell marker for both renal stem cells and intestinal stem cells ( Fig. S11B and 11C), encodes a transcription factor that contributes to stem cell maintenance through modulation of Notch activity (26). In the intestine, esg is not only expressed in intestinal stem cells but also in AstC-EEs (enteroendocrine cells that express Allatostatin C, AstC) and NPF-EEs (EEs that express neuropeptide F, NPF) (25). In the intestine, ISCs are highly mitotic, especially during regeneration, and give rise to a transient progenitor, the enteroblast (EB) (27,28), whereas in the Malpighian tubule, renal stem cells normally divide very slowly (19). In our previous gut scRNAseq study, ISCs/EBs were annotated as one cluster based on the expression of Dl and esg. However, this cluster could be split into ISCs and EBs, as one subset of cells in the ISC/EB cluster is Dl + klu − and another subset is Dl − klu + (25). Interestingly, we could also identify two renal stem cells sub-clusters based on the expression of Dl + klu − and Dl − klu + (Fig. S11D and S11E). The Dl + klu − sub-cluster specifically expresses Dl, N, and esg, reminiscent to the Dl + klu − sub-cluster of ISC (Fig. S11F), whereas the Dl − klu + sub-cluster specifically expresses E(spl)m3-HLH, E(spl)malpha-BFM, E(spl)mbeta-HLH, which are transcription factors executing Notch-mediated cellular differentiation (29, 30, Fig. S11F).

Cell-cell communication networks in the fly kidney.
Previous studies have indicated that the survival, renewal, and differentiation of principal cells (PCs) and stellate cells (SCs) are largely regulated through their cross-talk with renal stem cells (RSCs) (21)(22)(23). We used FlyPhoneDB (Liu et al, 2021) to explore cell-cell communication between the different fly kidney cell clusters. FlyPhoneDB was established recently and provides predictions of ligand-receptor interactions based on fly scRNA-seq data. We analyzed 13 major pathways and indicated their cell-cell interaction pairs between the different cell clusters (Fig.S12A). Strikingly, the Notch ligand only has interaction within RSCs and does not pair with other cell clusters (Fig.  S12A). This is consistent with previous studies showing that differential Notch activity is required for RSC homeostasis and that damage activates Notch signaling, which in turn regulates differentiation of RSCs to PCs (21,31). Further, we found that the EGFR signaling pathway connects RSCs and all SCs and PCs, with a preferentially strong interaction with main segment SCs and main segment PCs (Fig. S12A). These are consistent with previous studies showing that EGFR is dispensable for RSC maintenance but required for RSC proliferation (32). In addition, FlyPhoneDB predicts a strong interaction from main segment SCs to main segment PCs with the Pvf1-Pvr ligand-receptor pair (Fig. S12B). This interaction is based on the gene expression pattern in cell clusters of main segment SCs and PCs via MIST database and TF2TG literatures (Fig. S12C). Consistent with this, Pvf1 was highly expressed in main segment SCs as detected using Pvf1-Gal4>mCD8::GFP (Fig. S12D). Altogether, FlyPhoneDB predicts a number of specific signaling events between Malpighian tubule cell clusters. The full list of predicted gene pairs can be found in Dataset S9.
Metabolic pathway analysis. The basic functions of mammalian kidneys include metabolism of carbohydrates, proteins, lipids and other nutrients. As in mammalian kidneys, insect Malpighian tubules and nephrocytes play an essential role in the maintenance of ionic, acid-base and water balance, and elimination of metabolic and foreign toxins and homeostasis. To further understand metabolism in the fly kidney, we analyzed the KEGG metabolic pathways in UMAP of fly kidney snRNA-seq using AUCell software (2). Among 86 KEGG metabolic pathways, purine metabolism, glycerophospholipid metabolism, nicotinate and nicotinamide metabolism, starch and sucrose metabolism were more expressed than other pathways (Fig. S12A). A statistical significance analysis showed that 38 pathways were specifically expressed in one or two cell types (Fig. S12B). For examples, caffeine metabolism was specifically expressed in the main segment principal cells; nitrogen metabolism was specifically expressed in the main segment stellate cells; arginine and proline metabolism was specifically expressed in the lower segment principal cells and lower ureter principal cells; and folate biosynthesis was specifically expressed in the lower segment principal cells while thiamine metabolism was found in the upper ureter principal cells (Fig. S12C). Regarding purine metabolism, Xanthine oxidation is a necessary step in the catabolic pathway for purines toward urate, allantoin and urea. Dysfunction of xanthine oxidase/dehydrogenase (XO/XDH) causes a build-up of high levels of xanthine and hypoxanthine forming stones in humans and flies (33)(34)(35). Human ancestors lost the ability to synthesize a functional urate oxidase due to multiple point mutations in the Uro gene, resulting in increased serum and urinary uric acid (UA) levels (36). In the UA pathway, humans and flies share some of the same steps. The product of the fly Uro gene catalyzes formation of allantoin from UA (Fig.  S13A). Most UA pathway genes are enriched in fly kidney cells, specifically in principal cells ( Fig. S13B and C). The enzymes that control the last three steps, which are encoded by rosy (ry), Uro and CG30016, are highly enriched in main segment principal cells, suggesting that the last step occurs in this region (Fig. S13D). ry is the homolog of human XDH and loss of function of ry is associated with bloating in the lower tubules and formation of stones (37). Metabolomic analysis of ry mutants showed significant changes up to five metabolites away from the metabolic lesion, with large increases in levels of hypoxanthine and xanthine, and undetectable levels of the downstream metabolite UA (38). The product of CG30016 is predicted to have hydroxyisourate hydrolase activity and to be involved in purine nucleobase metabolism. It will be interesting to determine whether this gene also plays a role in maintaining fly urate levels.  (1). Right, annotation of the same data set at Leiden resolution 0.6 (this study). The FCA analysis reports four clusters for principal cells: lower ureter PC, lower segment PC, principal cell, and initial segment PC. In this study, we defined six clusters for principal cells based on Gal4 reporter lines: lower ureter PC, lower ureter PC, lower tubule PC, lower segment PC, main segment PC, and initial segment PC. Note that this figure contains all the original clusters, including non-Malpighian tubule cell clusters (salivary glands, artefacts), which we did not include in Fig. 1A.        The upper panel is for highly/ubiquitously expressed metabolic pathways in the mouse kidney, which were also highly expressed in the fly kidney. Lower panel is for five specifically expressed pathways in 1-2 cell clusters in the mouse kidney that are consistent with the fly data. Figure S10. Expression of fly orthologs of human kidney disease-associated genes in specific fly kidney cell types. Average expression in single cell clusters of fly orthologs of human monogenic disease genes and complex-trait genes identified from genome-wide association studies (GWAS) (39). Mean expression values of the genes were calculated for each cluster. The color scheme is based on z-score distribution (-3 < z-scores < 3). Each row in the heat map represents one gene and each column a single cell type.